Hashtable code improvements by Explorer09 · Pull Request #1842 · htop-dev/htop

Explorer09 · 2025-12-24T21:24:14Z

Summary

This PR introduces four commits to improve Hashtable implementation, focusing on safer size boundaries, cleaner consistency checks, and a more compact lookup table representation.

Changes

Commit 1: Extends the OEISprimes[] table to include primes up to (2^64 - 59), enabling support for larger hashtable sizes.

Commit 2: Moves size-related assertions into Hashtable_isConsistent(), tightens grow/shrink conditions to support hashtable sizes up to SIZE_MAX / 7 without arithmetic overflow, introduces an overflow guard for cases where sizeof(HashtableItem) < 7, and makes Hashtable_setSize() private by changing it from public API to static.

Commit 3: Replaces the static OEISprimes lookup table with a compact primeDiffs-based calculation using OEIS A013603 (differences from 2^n to nearest prime ≤ 2^n), reducing memory footprint from ~512 bytes to ~64 bytes for a runtime-computed lookup model.

Additional modifications include:

Load-factor growth condition changed from items >= size / 2 to items >= size * 7 / 10
Growth strategy changed from doubling to incrementing by 1
Shrink trigger changed from 8 * items < size to items < size / 8
Shrink target changed from size / 3 to size * 3 / 8

Commit Structure

The PR contains a structural issue noted in review: Commit 3 (lookup table shrinking) should be split into a separate PR. The first three commits are logically independent and can be cherry-picked without the fourth; Commit 3 depends on Commits 1–2, but Commits 1–2 do not depend on Commit 3. The reviewer requested this separation to decouple the size-safety improvements from the optimization debate about precomputed vs. runtime-computed lookup tables.

Implementation Assessment

The solution is clean with appropriate invariant checks. However, there is unresolved discussion about the lookup table approach: trading a larger precomputed static table (~512 bytes) for simpler/faster runtime access (precomputation via macro) versus maintaining a smaller runtime-computed table (~64 bytes) to reduce binary footprint. The author prioritizes htop's compactness, but has offered to exclude the final commit if maintainers prefer precomputation, suggesting flexibility on the final decision.

Recommended Action

Merge Commits 1–2 as-is. Separate Commit 3 into its own PR to allow independent evaluation of the lookup table optimization strategy.

BenBE · 2025-12-25T00:06:55Z

-      if (SIZE_MAX / 2 < this->size)
-         CRT_fatalError("Hashtable: size overflow");
-
+   if (10 * this->items > 7 * this->size)


The only reason this is safe from overflows is that sizeof(HashtableItem) > 7 …
But I'm not sure, this can also necessarily be said about sizeof(HashtableItem) > 10, thus the multiplications here technically should be overflow-checked …

sizeof(HashtableItem) is currently 12 for 32-bit systems and 24 for 64-bit systems. Since htop doesn't support 16-bit systems, I really doubt there would be a case where sizeof(HashtableItem) can be less than 10.

I can add a sizeof(HashtableItem) > 10 assertion just to be safe.

Explorer09 · 2025-12-25T10:03:50Z

+      if (sizeof(HashtableItem) < 10 && SIZE_MAX / 10 < this->size)
         CRT_fatalError("Hashtable: size overflow");

      Hashtable_setSize(this, 2 * this->size);


I am tempted to adjust this line so that it becomes Hashtable_setSize(this, this->size + 1);

Since Hashtable.size is supposed to be a prime number close to a power of 2. There can be a case where multiplying the size by 2 skips an order of magnitude on the buffer size allocation. Example: (2^14 - 3) * 2 = 2^15 - 6 > 2^15 - 19, thus (2^15 - 19) might be skipped in the buffer size allocation.

I just doubt if this change is safe.

On first glance this should work. Not tested in detail though. But given that setSize rounds up to the next prime anyway this should likely work.

Explorer09 · 2025-12-25T17:08:27Z

Not sure if I can ask @cgzones a question:

Is it allowed for the assertion of this line be changed to assert(this->size >= this->items);?

In other words, what would happen if this->items == this->size?

There is a commit, b45eaf2, that changed the minimum size to 7, but the reason stated in that commit didn't fully make sense to me. While between 2 and 3, the grow factor ((3 - 2)/2 = 50%) is indeed less than 70%, but between 3 and 7, there'd be no problem with the grow factor ((7 - 3)/3 = 133%). The cause of the assertion error was more of an off-by-one from the conditional 10 * this->items > 7 * this->size. It should be 10 * (this->items + 1) > 7 * this->size instead if we have to satisfy the assertion this->size > this->items.

I am reluctant to add a +1 to the this->items conditional above, as I guess that the whole Hashtable structure should work fine if we allow this->items == this->size.

Given size is the number of allocted entries and items the number of actually used entries, the >= should be fine.

The reason for avoiding hash tables below 7 is efficiency: It doesn't make sense to allocate smaller blocks as most of our hash tables are far larger anyway.

@BenBE I know efficiency is a good reason, but I just want to write a good technical reason in the code comments. Especially that some numbers in the primeDiffs array (in my commit) will be intentionally unused.

Another (technical) reason is memory fragmentation. Allocating small blocks tends to fragment memory far more than using larger blocks (which for small collections might not even need resizing).

BenBE · 2025-12-26T20:06:37Z

How's the last commit related?

Explorer09 · 2025-12-26T21:11:43Z

How's the last commit related?

My improvement on the Hashtable code is to hope that the ht_key_t type can be upgraded from unsigned int to size_t, otherwise it would make no sense to support a Hashtable size of more than 2^32 entries.

The last commit might look distracting to the code improvement commits. I apologize. If the last commit would need more review, I'm happy to move it to a separate pull request.

BenBE · 2025-12-26T22:34:51Z

NP with the commit itself. Just wondered if they are complete. Also, do the current set of changes in the first 3 commits do work without the last one.

Explorer09 · 2025-12-27T00:55:33Z

NP with the commit itself. Just wondered if they are complete. Also, do the current set of changes in the first 3 commits do work without the last one.

I think the last commit needs some cleanup or discussion, but the first 3 commits are ready and can be cherry-picked to main early.

The last commit depends on the first 3 commits but the first 3 can work without the last.

BenBE · 2025-12-27T11:02:39Z

Can you split off the last commit into its own PR? TIA.

BenBE · 2025-12-27T22:51:08Z

+   if (this->items >= this->size * 7 / 10)
+      Hashtable_setSize(this, this->size + 1);


AFAICS the Hashtable_setSize should be called in either path to allocate new entries as needed.

What do you mean? This part of code addresses expanding the buffer. The buckets buffer is allocated stating at Hashtable_new.

Yes, but once the items get near the size because it can't allocate any more buffer space, you could at some point reach items == size, and thus the next insert will fail due to no more space allocated.

Instead when nearing the maximum capacity we should fall of to a more linear allocation regime …

@BenBE When it "can't allocate any more buffer space" htop will exit, because of the xCalloc call.

The case where items == size can only happen on small sizes such as 2 or 3 (the minimum size is 7 now, so the sizes of 2 and 3 are theoretical situations), but even when that happens, the next Hashtable_put call will always grow the buffer. Thus there's no problem here.

I was more thinking for very large allocations. Will have to take a closer look after New Year's …

Any harm in making the call to Hashtable_setSize unconditional? The above bounds check should be part of that function already.

I was thinking how that would affect the shrinking of the buffer, with respect to this code:

https://github.com/Explorer09/htop-1/blob/3dc65f62da56befae7ed7dcb6e66ca5bea856710/Hashtable.c#L292

I personally like the idea, by centralizing the conditionals that readjust the buffer size, we can save some sanity checks in the Hashtable_setSize function.

Update: It seems that there's a side effect if I try to merge the conditionals of expanding and shrinking the buffer in Hashtable_setSize, thus I have to give up on the idea.

When creating a Hashtable through Hashtable_new, it is allowed to specify a larger size for initial allocation. During the initial population of the items, this avoids unnecessary expansion or relocation of the buffer. If I move the shrinking condition to Hashtable_setSize, then the buffer will shrink automatically when adding an element to it. This would remove the benefits of initialing a Hashtable with larger size.

I think we arrived at this issue before … IIRC.

Maybe inhibit shrinking while we try to insert items …

Maybe inhibit shrinking while we try to insert items …

Inhibit shrinking means a flag argument in Hashtable_setSize. It seems like we cannot have less than 2 arguments for the setSize function. If we cannot reduce the number of arguments for it, then I'd like to keep the current function prototype, and use the size argument to determine whether the buffer should grow or shrink.

BenBE · 2026-02-20T18:40:33Z

@Explorer09 Any reason against pre-computing the table at compile-time?

#include <stdint.h>

#define B(i,a) (((uint64_t)1ull << (i)) - (a)),

#define A1(I, a0) \
    B((I)+0, a0)
#define A2(I, a0,a1) \
    A1((I)+0, a0) \
    A1((I)+1, a1)
#define A4(I, a0,a1,a2,a3) \
    A2((I)+0, a0,a1) \
    A2((I)+2, a2,a3)
#define A8(I, a0,a1,a2,a3,a4,a5,a6,a7) \
    A4((I)+0, a0,a1,a2,a3) \
    A4((I)+4, a4,a5,a6,a7)
#define A16(I, a0,a1,a2,a3,a4,a5,a6,a7,a8,a9,a10,a11,a12,a13,a14,a15) \
    A8((I)+0, a0,a1,a2,a3,a4,a5,a6,a7) \
    A8((I)+8, a8,a9,a10,a11,a12,a13,a14,a15)

static const uint64_t primeDiffs[] = {
    A16( 0, 0,0,1,1,3,1,3,1,5,3,3,9,3,1,3,19)
#if SIZE_MAX > UINT16_MAX
    A16(16, 15,1,5,1,3,9,3,15,3,39,5,39,57,3,35,1)
# if SIZE_MAX > UINT32_MAX
    A16(32, 5,9,41,31,5,25,45,7,87,21,11,57,17,55,21,115)
    A16(48, 59,81,27,129,47,111,33,55,5,13,27,55,93,1,57,25)
# endif
#endif
};

#undef A16
#undef A8
#undef A4
#undef A2
#undef A1
#undef B

Saves you from calculating things at runtime on every access to that table and the memory size isn't of much concern here.

FWIW, the code path isn't hot enough to gamble on keeping a single dcache line for the table contents.

Explorer09 · 2026-02-20T22:59:20Z

Explorer09 Any reason against pre-computing the table at compile-time?

Why pre-compute then? It was because the code path isn't a hot one that I proposed reducing the table size in memory. It's quite trivial to compute ((1ull << n) - A[n]). You don't reallocate the memory of Hashtables often.

The shrunk lookup table size could be just 64 bytes. If I expand and pre-compute the way you do, then it would become 512 bytes (64 * sizeof(uint64_t)) it's 8 times difference.

BenBE · 2026-02-20T23:08:46Z

Multiple reasons:

Reducing runtime code complexity (basically a follow-up to PR Allow for optimizing out bound check on 64 bit systems #1909)
No necessity to safe every last byte; in particular with static (constant) program memory.
Single-byte access is somewhat less efficient here (additional movzx compared to plain memory read)

Explorer09 · 2026-02-22T09:48:06Z

Multiple reasons:

Reducing runtime code complexity (basically a follow-up to PR Allow for optimizing out bound check on 64 bit systems #1909)

No necessity to safe every last byte; in particular with static (constant) program memory.

Single-byte access is somewhat less efficient here (additional movzx compared to plain memory read)

I don't like the reasoning, although I think such debate is going to be like arguing the color of a bikeshed, and thus I'm not going to change the decisions of the maintainers. You have the choice. I've rebased this PR so you can leave out the lookup table shrink.

Anyway, here's my motivation: My vision of htop is that it should be a compact tool for process monitoring even though it comes with a fancy, terminal UI. That means if there's any chance to reduce memory footprint for the htop program itself, I would wish it could be done (unless there's a larger performance tradeoff with reducing the code size). The memory is better left for large, server applications that need them. In my opinion, it's better for htop to be slightly slower (i.e. take slightly more CPU time) for certain tasks if doing them faster would consume memory that could be essential for server apps.

coderabbitai · 2026-05-17T06:34:28Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 86d08ac9-6795-4d4c-8e33-88f512000888

📥 Commits

Reviewing files that changed from the base of the PR and between cf0951a and 0df230b.

📒 Files selected for processing (2)

Hashtable.c
Hashtable.h

💤 Files with no reviewable changes (1)

Hashtable.h

📝 Walkthrough

Summary

This PR shrinks the nextPrime() lookup table in the Hashtable implementation by storing compact prime differences instead of full prime values.

Key Changes

Lookup Table Optimization:

Replaced precomputed OEISprimes table with compact primeDiffs uint8_t array (OEIS A013603)
Stores differences: 2^n - primeDiffs[n] yields primes ≤ 2^n, reducing memory from ~512 bytes to ~64 bytes
Table entries conditional on SIZE_MAX to omit unused entries on smaller platforms (16-bit, 32-bit systems)
Removes explicit (2^64 - 59) entry, consolidating logic

Defensive Assertions:

Hashtable_isConsistent() enforced invariants: size > 0, size <= SIZE_MAX / sizeof(HashtableItem), size >= items

API Simplification:

Hashtable_setSize() converted from public to static; removed from Hashtable.h

Capacity Growth/Shrink:

Growth trigger: items >= size * 7 / 10 (explicit load-factor)
Shrink trigger: items < size / 8; target size * 3 / 8 with nextPrime() rounding
Overflow guards specific to data structure size (sizeof(HashtableItem))
Early-exit when nextPrime() returns unchanged size

Design Rationale

The changes support upgrading ht_key_t from unsigned int to size_t, enabling Hashtable sizes beyond 2^32 entries while maintaining the tool's compact memory footprint. Runtime computation of (2^n - primeDiffs[n]) is negligible; the savings (448 bytes) better serve larger server applications than static program memory.

Lines changed: +37/-34 in Hashtable.c; +0/-2 in Hashtable.h

Note: This commit was separated from foundational capacity-logic commits at maintainer request, though it depends on those prior changes for correct functioning.

Walkthrough

This PR refactors hashtable resizing from table-driven primes to algorithmic prime generation, then internalizes the public Hashtable_setSize API and rewires growth/shrink heuristics to use incremental resizing with prime-rounded targets. The prime computation switches from a precomputed OEISprimes array to a primeDiffs-based bit-shift calculation. Growth now triggers at 70% load and increments size by 1 (relying on nextPrime rounding), while shrink triggers at 12.5% fullness and targets 37.5% of current size. New overflow guards keyed to sizeof checks protect against arithmetic overflow in specific branches.

Poem

Table primes now compute on the fly,
No lookup array needed—bits shift high,
Static setSize rounds the way,
Growing by one, shrinking by eight,
Each resize finds its perfect state.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

* Move assertions about hash table sizes to Hashtable_isConsistent() so they can be checked in all Hashtable methods. * Slightly improve conditionals of growing and shrinking the "buckets" buffer. Specifically the calculations are now less prone to arithmetic overflow and can work with Hashtable.size value up to (SIZE_MAX / 7). (Original limit was (SIZE_MAX / 10)). * If `Hashtable.size > SIZE_MAX / sizeof(HashtableItem)`, allow the compiler to optimize out one conditional of checking overflow. (The buffer allocation would still fail at xCalloc() in that case.) * Hashtable_setSize() is now a private method. Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

The lookup table now codes the difference between 2^n to the nearest prime not greater than 2^n (i.e. https://oeis.org/A013603 ). With the change of the lookup table, (2^64 - 59) has been removed. It is believed that such removal won't cause practical problems as the number is very close to SIZE_MAX and a system is unlikely to succeed in allocating a memory block _that_ huge. Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Explorer09 changed the title ~~Hashtable code shrink~~ Hashtable code improvements Dec 24, 2025

BenBE added enhancement Extension or improvement to existing feature code quality ♻️ Code quality enhancement labels Dec 24, 2025

BenBE requested changes Dec 25, 2025

View reviewed changes

Explorer09 force-pushed the hashtable-primes branch 3 times, most recently from fabbd8e to 3870132 Compare December 25, 2025 09:54

Explorer09 commented Dec 25, 2025

View reviewed changes

Explorer09 force-pushed the hashtable-primes branch from 3870132 to daf0292 Compare December 26, 2025 19:17

Explorer09 force-pushed the hashtable-primes branch 5 times, most recently from be3e2f7 to 493c3cb Compare December 26, 2025 21:05

Explorer09 force-pushed the hashtable-primes branch from 493c3cb to daf0292 Compare December 27, 2025 16:36

BenBE reviewed Dec 27, 2025

View reviewed changes

Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 7cf3e2a to 3dc65f6 Compare January 1, 2026 18:57

Explorer09 force-pushed the hashtable-primes branch 5 times, most recently from 80c8562 to ed56798 Compare January 12, 2026 04:16

Explorer09 force-pushed the hashtable-primes branch from ed56798 to 9d6d20a Compare January 28, 2026 06:29

Explorer09 force-pushed the hashtable-primes branch from 9f35f81 to 88db60c Compare February 22, 2026 09:24

Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 4e01e6e to e201bcb Compare February 28, 2026 18:20

Explorer09 force-pushed the hashtable-primes branch from e201bcb to 158bed2 Compare March 9, 2026 17:33

Explorer09 force-pushed the hashtable-primes branch from 158bed2 to 84d4b7b Compare March 18, 2026 11:13

Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 8b2ff8a to c322974 Compare April 1, 2026 16:28

Explorer09 force-pushed the hashtable-primes branch 3 times, most recently from d03c955 to b46d41a Compare April 10, 2026 05:45

Explorer09 force-pushed the hashtable-primes branch 4 times, most recently from 5f4949c to 9399923 Compare April 19, 2026 17:52

Explorer09 force-pushed the hashtable-primes branch 3 times, most recently from 11b5401 to 4f4f2dc Compare May 1, 2026 10:01

Explorer09 force-pushed the hashtable-primes branch 3 times, most recently from 84ab45f to 56fd4d4 Compare May 11, 2026 04:08

Explorer09 force-pushed the hashtable-primes branch from 56fd4d4 to cf0951a Compare May 17, 2026 06:34

Explorer09 added 3 commits June 16, 2026 13:26

Hashtable: Extend OEISprimes[] to up to (2^64 - 59)

75cf233

Signed-off-by: Kang-Che Sung <explorer09@gmail.com>

Explorer09 force-pushed the hashtable-primes branch from cf0951a to 0df230b Compare June 16, 2026 05:31

		if (this->items >= this->size * 7 / 10)
		Hashtable_setSize(this, this->size + 1);

Uh oh!

Uh oh!

Conversation

Explorer09 commented Dec 24, 2025 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Commit Structure

Implementation Assessment

Recommended Action

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BenBE commented Dec 26, 2025

Uh oh!

Explorer09 commented Dec 26, 2025

Uh oh!

BenBE commented Dec 26, 2025

Uh oh!

Explorer09 commented Dec 27, 2025

Uh oh!

BenBE commented Dec 27, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Explorer09 Jan 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Explorer09 Jan 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BenBE Jan 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

BenBE commented Feb 20, 2026

Uh oh!

Explorer09 commented Feb 20, 2026

Uh oh!

BenBE commented Feb 20, 2026

Uh oh!

Explorer09 commented Feb 22, 2026

Uh oh!

coderabbitai Bot commented May 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key Changes

Design Rationale

Explorer09 commented Dec 24, 2025 •

edited by coderabbitai Bot

Loading

Explorer09 Jan 10, 2026 •

edited

Loading

Explorer09 Jan 11, 2026 •

edited

Loading

BenBE Jan 12, 2026 •

edited

Loading

coderabbitai Bot commented May 17, 2026 •

edited

Loading